Title: METHOD FOR CONSTRUCTING A LIBRARY USING DNA SHUFFLING



FIELD OF THE INVENTION 

The present invention relates to optimizing DNA sequences in order to (a) improve the properties of a protein of interest by artificially generating genetic diversity in genes encoding proteins having a biological activity of interest, using the so-called gene or DNA shuffling technique to create a large library of "genes", expressing said library of genes in a suitable expression system and screening the expressed proteins with respect to specific characteristics to identify proteins exhibiting desired properties, or (b) improve the properties of regulatory elements such as promoters or terminators by generating a library of these elements, transforming suitable hosts therewith in operable conjunction with a structural gene, expressing said structural gene and screening for desirable properties in the regulatory element.



BACKGROUND OF THE INVENTION

It is generally found that a protein performing a certain bioactivity exhibits a certain variation between genera, and even between members of the same species differences may exist. This variation is of course even more pronounced at the genomic level.

This natural genetic diversity among genes coding for proteins having basically the same bioactivity has been generated in Nature over billions of years and reflects a natural optimization of the proteins coded for with respect to the environment of the organism in question.
In today's society the conditions of life are vastly removed from the natural environment, and it has been found that the naturally occurring bioactive molecules are not optimized for the various uses to which they are put by mankind, especially when they are used for industrial purposes.

It has therefore been of interest to industry to identify such bioactive proteins that exhibit optimal properties with respect to the use for which they are intended.

This has for many years been done by screening of natural sources, or by use of mutagenesis. For instance, within the technical field of enzymes for use in e.g. detergents, the washing and/or dishwashing performance of e.g. naturally occurring proteases, lipases, amylases and cellulases has been improved significantly by in vitro modifications of the enzymes.

In most cases these improvements have been obtained by site-directed mutagenesis resulting in substitution, deletion or insertion of specific amino acid residues which have been chosen either on the basis of their type or on the basis of their location in the secondary or tertiary structure of the mature enzyme (see for instance US patent no. ?,518,534).

In this manner novel polypeptide variants and mutants, such as novel modified enzymes with altered characteristics, e.g. specific activity, substrate specificity, thermal, pH and salt stability, pH optimum, pI, Km, Vmax etc., have successfully been prepared to obtain polypeptides with improved properties.

For instance, within the technical field of enzymes, the washing and/or dishwashing performance of e.g. proteases, lipases, amylases and cellulases has been improved significantly.
An alternative general approach for modifying proteins and enzymes has been based on random mutagenesis, for instance as disclosed in US 4,894,331 and WO 93/01285.

As it is a cumbersome and time-consuming process to obtain polypeptide variants or mutants with improved functional properties, a few alternative methods for rapid preparation of modified polypeptides have been suggested.

Weber et al., (1983), Nucleic Acids Research, vol. 11, 5661-5661, describes a method for modifying genes by in vivo recombination between two homologous genes. A linear DNA sequence comprising a plasmid vector flanked by a DNA sequence encoding alpha-1 human interferon at the 5'-end and a DNA sequence encoding alpha-2 human interferon at the 3'-end is constructed and transfected into a recA positive strain of E. coli. Recombinants were identified and isolated using a resistance marker.

Pompon et al., (1989), Gene 83, p. 15-24, describes a method for shuffling gene domains of mammalian cytochrome P-450 by in vivo recombination of partially homologous sequences in Saccharomyces cerevisiae, by transforming Saccharomyces cerevisiae with a linearized plasmid with filled-in ends and a DNA fragment that is partially homologous to the ends of said plasmid.

In WO 97/07205 a method is described whereby polypeptide variants are prepared by shuffling different nucleotide sequences of homologous DNA sequences by in vivo recombination using plasmid DNA as template.

US patent no. 5,093,257 (Assignee: Genencor Int. Inc.) discloses a method for producing hybrid polypeptides by in vivo recombination. Hybrid DNA sequences are produced by forming a circular vector comprising a replication sequence, a first DNA sequence encoding the amino-terminal portion of the hybrid polypeptide, and a second DNA sequence encoding the carboxy-terminal portion of said hybrid polypeptide. The circular vector is transformed into a rec positive microorganism in which the circular vector is amplified. This results in recombination of said circular vector mediated by the naturally occurring recombination mechanism of the rec positive microorganism; such microorganisms include prokaryotes such as Bacillus and E. coli, and eukaryotes such as Saccharomyces cerevisiae.

One method for the shuffling of homologous DNA sequences has been described by Stemmer (Stemmer, (1994), Proc. Natl. Acad. Sci. USA, Vol. 91, 10747-10751; Stemmer, (1994), Nature, vol. 370, 389-391). The method concerns shuffling homologous DNA sequences by using in vitro PCR techniques. Positive recombinant genes containing shuffled DNA sequences are selected from a DNA library based on the improved function of the expressed proteins.

The above method is also described in WO 95/22625. WO 95/22625 relates to a method for shuffling of homologous DNA sequences. An important step in the method described in WO 95/22625 is to cleave the homologous template double-stranded polynucleotide into random fragments of a desired size, followed by homologous reassembly of the fragments into full-length genes.

A disadvantage inherent to the method of WO 95/22625 is, however, that the diversity generated through that method is limited due to the use of homologous gene sequences (as defined in WO 95/22625).

Another disadvantage of the method of WO 95/22625 lies in the production of the random fragments by cleavage of the template double-stranded polynucleotide.

A further reference of interest is WO 95/17413, describing a method of gene or DNA shuffling by recombination of specific DNA sequences, so-called design elements (DE), either by recombination of synthesized double-stranded fragments or by recombination of PCR-generated sequences, to produce so-called functional elements (FE) comprising at least two of the design elements. According to the method described in WO 95/17413 the recombination has to be performed among design elements that have DNA sequences with sufficient sequence homology to enable hybridization of the different sequences to be recombined.

WO 95/17413 therefore also entails the disadvantage that the diversity generated is relatively limited. Furthermore, the methods described are time-consuming, expensive, and not suited for automation.

Despite the existence of the above methods there is still a need for better iterative in vivo recombination methods for preparing novel polypeptide variants. Such methods should also be capable of being performed in small volumes and be amenable to automation.

Furthermore, there is also a need for methods that make it possible to shuffle genes with relatively low homology.

SUMMARY OF THE INVENTION

The present invention relates to a method for the construction of a library of recombined polynucleotides from a number of different starting single or double stranded parental DNA templates, wherein said starting single or double stranded parental DNA templates represent discrete points in a population of genes encoding evolutionary or synthetic homologues of a peptide having homologies ranging over a broad spectrum from less than 15% to more than 80%, said population exhibiting at least one identification sequence, and whereby said genes are subjected to a gene shuffling procedure to generate shuffled mutants of said population of genes representing additional discrete points between those of said starting templates.

The gene shuffling procedure to be used according to the invention can be any suitable method, such as those described above or a procedure as described in our co-pending patent application filed on the same date, and outlined below.

According to that procedure, template shifts of newly synthesized DNA strands during in vitro DNA synthesis are utilized to achieve DNA shuffling.

In a further aspect the invention relates to a method of identifying polypeptides exhibiting improved properties in comparison to naturally occurring polypeptides of the same bioactivity, whereby a library of recombined polynucleotides produced by the above process is cloned into an appropriate vector, said vector is then transformed into a suitable host system to be expressed into the corresponding polypeptides, said polypeptides are then screened in a suitable assay, and positive results are selected.

In a still further aspect, the invention relates to a method for producing a polypeptide of interest as identified in the preceding process, whereby a vector comprising a polynucleotide encoding said polypeptide is transformed into a suitable host, said host is grown to express said polypeptide, and the polypeptide is recovered and purified.

DEFINITIONS

Prior to discussing this invention in further detail, the following terms will first be defined.

"Shuffling": The term "shuffling" herein means recombination of nucleotide sequence fragment(s) between two or more polynucleotides, resulting in output polynucleotides (i.e. polynucleotides having been subjected to a shuffling cycle) having a number of nucleotide fragments exchanged in comparison to the input polynucleotides (i.e. starting point polynucleotides).

"Homology of DNA sequences or polynucleotides": In the present context the degree of DNA sequence homology is determined as the degree of identity between two sequences indicating a derivation of the first sequence from the second. The homology may suitably be determined by means of computer programs known in the art, such as GAP provided in the GCG program package (Program Manual for the Wisconsin Package, Version 8, August 1994, Genetics Computer Group, 575 Science Drive, Madison, Wisconsin, USA 53711) (Needleman, S.B. and Wunsch, C.D., (1970), Journal of Molecular Biology, 48, 443-453).

"Homologous": The term "homologous" means that one single-stranded nucleic acid sequence may hybridize to a complementary single-stranded nucleic acid sequence. The degree of hybridization may depend on a number of factors, including the amount of identity between the sequences and the hybridization conditions such as temperature and salt concentration, as discussed later (vide infra).

Using the computer program GAP (vide supra) with the following settings for DNA sequence comparison: GAP creation penalty of 5.0 and GAP extension penalty of 0.3, it is in the present context believed that two DNA sequences will be able to hybridize (using low stringency hybridization conditions as defined below) if they mutually exhibit a degree of identity of preferably at least 70%, more preferably at least 80%, and even more preferably at least 85%.
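The identity thresholds above presuppose a global alignment with gap penalties. The Python sketch below is an assumption-laden stand-in for GAP, not the GCG implementation: it uses Needleman-Wunsch dynamic programming with affine gaps (the Gotoh variant), assumes a match score of 1.0 and a mismatch score of 0.0, charges a gap of length L as 5.0 + 0.3 × L, and reports identity as identical aligned positions divided by the length of the shorter sequence. The function names `global_align` and `percent_identity` are hypothetical.

```python
NEG_INF = float("-inf")


def global_align(a, b, match=1.0, mismatch=0.0, gap_open=5.0, gap_extend=0.3):
    """Global alignment with affine gaps (Gotoh variant of Needleman-Wunsch).

    A gap of length L costs gap_open + gap_extend * L (assumed scoring).
    Returns (score, aligned_a, aligned_b).
    """
    n, m = len(a), len(b)

    def grid(value):
        return [[value] * (m + 1) for _ in range(n + 1)]

    # M: a[i-1] paired with b[j-1]; X: a[i-1] against a gap; Y: b[j-1] against a gap.
    M, X, Y = grid(NEG_INF), grid(NEG_INF), grid(NEG_INF)
    back = {"M": grid(None), "X": grid(None), "Y": grid(None)}  # predecessor states
    M[0][0] = 0.0
    for i in range(1, n + 1):
        X[i][0], back["X"][i][0] = -(gap_open + gap_extend * i), "X"
    for j in range(1, m + 1):
        Y[0][j], back["Y"][0][j] = -(gap_open + gap_extend * j), "Y"

    def best(*candidates):
        # Each candidate is (predecessor_state, value); ties keep the first.
        return max(candidates, key=lambda c: c[1])

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            state, value = best(("M", M[i - 1][j - 1]), ("X", X[i - 1][j - 1]),
                                ("Y", Y[i - 1][j - 1]))
            M[i][j], back["M"][i][j] = value + s, state
            state, value = best(("M", M[i - 1][j] - gap_open - gap_extend),
                                ("X", X[i - 1][j] - gap_extend),
                                ("Y", Y[i - 1][j] - gap_open - gap_extend))
            X[i][j], back["X"][i][j] = value, state
            state, value = best(("M", M[i][j - 1] - gap_open - gap_extend),
                                ("Y", Y[i][j - 1] - gap_extend),
                                ("X", X[i][j - 1] - gap_open - gap_extend))
            Y[i][j], back["Y"][i][j] = value, state

    # Trace back from whichever end state scores best.
    state, total = best(("M", M[n][m]), ("X", X[n][m]), ("Y", Y[n][m]))
    top, bottom, i, j = [], [], n, m
    while i > 0 or j > 0:
        previous = back[state][i][j]
        if state == "M":
            top.append(a[i - 1])
            bottom.append(b[j - 1])
            i, j = i - 1, j - 1
        elif state == "X":
            top.append(a[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(b[j - 1])
            j -= 1
        state = previous
    return total, "".join(reversed(top)), "".join(reversed(bottom))


def percent_identity(a, b, **scoring):
    """Identical aligned positions divided by the length of the shorter sequence."""
    if not a or not b:
        return 0.0
    _, top, bottom = global_align(a, b, **scoring)
    same = sum(1 for x, y in zip(top, bottom) if x == y != "-")
    return 100.0 * same / min(len(a), len(b))
```

Two sequences would then be treated as likely to hybridize when `percent_identity` reaches the chosen threshold (70%, 80% or 85%). Real GAP results also depend on its scoring matrix and end-gap handling, so values from this sketch are only indicative.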

"Heterologous": If two or more DNA sequences mutually exhibit a degree of identity which is less than specified above, they are in the present context said to be "heterologous".
"Hybridization": Suitable experimental conditions for determining whether two or more DNA sequences of interest do hybridize or not are herein defined as hybridization at low stringency, as described in detail below.

Molecules to which the oligonucleotide probe hybridizes under these conditions are detected using an X-ray film or a phosphoimager.

"Primer": The term "primer" used herein, especially in connection with a PCR reaction, is an oligonucleotide (especially a "PCR primer") defined and constructed according to general standard specifications known in the art ("PCR A practical approach", IRL Press, (1991)).

"A primer directed to a sequence": The term "a primer directed to a sequence" means that the primer (preferably to be used in a PCR reaction) is constructed to exhibit at least 80% degree of sequence identity to the sequence part of interest, more preferably at least 90% degree of sequence identity to the sequence part of interest, to which said primer consequently is "directed". The primer is designed to anneal specifically, at a given temperature, to the region towards which it is directed. Identity at the 3' end of the primer is especially essential for the function of the polymerase, i.e. the ability of a polymerase to extend the annealed primer.
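The "directed to" criterion can be sketched in Python as a gap-free comparison between a primer and its intended site. This is a minimal sketch under stated assumptions: the identity threshold defaults to 90%, and the requirement for identity at the 3' end is modelled as an exact match over a hypothetical window of the last 3 bases (the window size is an illustrative assumption, not a value from the text). The name `is_directed_to` is hypothetical.

```python
def is_directed_to(primer, site, min_identity=90.0, three_prime_window=3):
    """Gap-free comparison of a primer with its intended target site.

    `site` is the target written in the same 5'->3' orientation as the primer,
    i.e. the strand the primer should be (nearly) identical to. Returns a tuple
    (directed, percent_identity). Both thresholds are assumptions.
    """
    if not primer or len(primer) != len(site):
        raise ValueError("primer and site must be non-empty and of equal length")
    primer, site = primer.upper(), site.upper()
    identity = 100.0 * sum(p == s for p, s in zip(primer, site)) / len(primer)
    # Extension starts at the primer's 3' end, so that end must match exactly.
    tail_matches = primer[-three_prime_window:] == site[-three_prime_window:]
    return identity >= min_identity and tail_matches, identity
```

A primer with a single mismatch at its 5' end can thus pass, while the same single mismatch at the 3' terminus fails, reflecting the asymmetry described above.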

"Flanking": The term "flanking" used herein in connection with DNA sequences comprised in a PCR fragment means the outermost partial sequences of the PCR fragment, at both the 5' and the 3' ends of the PCR fragment.

"Polypeptide": Polymers of amino acids, sometimes referred to as proteins. The sequence of amino acids determines the folded conformation that the polypeptide assumes, and this in turn determines biological properties such as activity. Some polypeptides consist of a single polypeptide chain (monomeric), whilst others comprise several associated polypeptides (multimeric). All enzymes and antibodies are polypeptides.

"Enzyme": A protein capable of catalysing chemical reactions. Specific types of enzymes are a) hydrolases, including amylases, cellulases and other carbohydrases, proteases, and lipases, b) oxidoreductases, c) ligases, d) lyases, e) isomerases, f) transferases, etc. Of specific interest in relation to the present invention are enzymes used in detergents, such as proteases, lipases, cellulases, amylases, etc.



DETAILED DESCRIPTION OF THE INVENTION

All possible genes encoding a polypeptide of the same evolutionary origin can be seen as a very large population of DNA sequences (e.g. {Gsp | set of genes encoding a serine protease}). It has been found that the homology between the polypeptides encoded by single members of such a population may be as small as less than 15% (the genes originating from "distant" organisms).

When searching for polypeptides suited for the various purposes that mankind has developed, it has been found difficult, if not impossible at our present level of knowledge, to conclude in a rational manner on the optimal configuration of the polypeptide in question. Therefore it was found desirable to provide a simple method of generating a sub-population of the above-mentioned very large population, but representing a substantial part of the variation possible within the large population.

The object of the present invention is thus to provide a method whereby it is possible to shuffle components of genes encoding polypeptides of the same functionality, but having only low homologies.

To this end it is necessary to obtain a reasonable knowledge of the population in question, meaning having at disposal a number of individual members (e.g. 10, 15 or more members) representing as high a variation as possible. This small sub-population is then used as a starting point for generating a much larger sub-population of genes. The corresponding polypeptides of the large sub-population obtained are then displayed and screened in an appropriate manner to identify those members of the large sub-population that are optimal for the intended purpose.

It was found that the expansion of the starting sub-population to the large sub-population could be accomplished using gene shuffling methods.

Such methods as described in the literature provide means to exchange DNA fragments between genes coding for polypeptides of a reasonably high homology, typically above 80%, resulting in the generation of novel genes encoding polypeptides having homologies between 80% and 99%.

It was also found that in the method of the invention it was necessary to use, as starting population, genes encoding polypeptides that are at least 70% to 80% homologous to at least one other gene in the starting population.

According to the invention it is thus important to start from a population or subset of genes which comprises "intermediate sequences", ranging from genes being rather similar to genes being rather dissimilar, but still having the same evolutionary origin (function). Only then is a shuffling of even rather heterologous sequences feasible. The stepwise shuffling of, first, quite homologous genes creates new species which are not contained in the starting population, and which, in the subsequent shuffling rounds, will recombine with each other and with other, more heterologous genes from the starting population, and so on.

Finally, hybrids are generated in which sequence parts from very heterologous starting genes can be found. These starting genes would never have been shuffled without the intermediate species in the starting population, because of a too large "sequence space" distance.
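The requirement for intermediate sequences can be checked before shuffling starts. A minimal Python sketch, assuming pairwise identities between the starting genes have already been computed (for example with GAP) and using 70% as the threshold for a usable link: each gene should have at least one partner above the threshold, and the genes should form a single connected group so that stepwise shuffling can in principle bridge even distant members. The function name `check_population` and the example identities are hypothetical.

```python
from collections import deque


def check_population(identity, threshold=70.0):
    """identity: {(gene1, gene2): percent identity}, one entry per pair.

    Returns (isolated genes, connected groups) using links >= threshold.
    """
    genes = sorted({g for pair in identity for g in pair})
    neighbours = {g: set() for g in genes}
    for (g1, g2), value in identity.items():
        if value >= threshold:
            neighbours[g1].add(g2)
            neighbours[g2].add(g1)
    isolated = [g for g in genes if not neighbours[g]]
    # Breadth-first search: genes in one group are linked through intermediates.
    groups, seen = [], set()
    for start in genes:
        if start in seen:
            continue
        group, queue = [], deque([start])
        seen.add(start)
        while queue:
            g = queue.popleft()
            group.append(g)
            for h in sorted(neighbours[g]):
                if h not in seen:
                    seen.add(h)
                    queue.append(h)
        groups.append(sorted(group))
    return isolated, groups
```

With identities A-B 85%, B-C 75%, A-C 40% and C-D 30%, genes A and C end up in the same group although they are only 40% identical, because B acts as the intermediate; D is reported as isolated and would need additional intermediate members before it could be recruited into the shuffling.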

With this condition fulfilled, it was found possible to generate genes encoding novel functional polypeptides having a homology as low as the minimum degree of homology represented in the starting population. In principle the homology range in the final population may be even greater than that of the starting population.

The present invention relates in its first aspect to a method for the construction of a library of recombined polynucleotides from a number of different starting single or double stranded parental DNA templates, wherein said starting single or double stranded parental DNA templates represent discrete points in a population of genes encoding evolutionary or synthetic homologues of a peptide having homologies ranging over a broad spectrum from less than 15% to more than 80%, said population exhibiting at least one identification sequence, and whereby said genes are subjected to a gene shuffling procedure to generate shuffled mutants of said population of genes representing additional discrete points between those of said starting templates.

According to the invention it is possible to use parental DNA templates representing homologies ranging from less than 45%, 40%, 35%, 30%, 25%, 20%, or 15% to more than 80%, 85%, 90%, 95%, or 99%.
In specific embodiments at least one identification sequence is identified and primers are constructed to anneal thereto. These sequences can be located anywhere on the genes.
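Because identification sequences are conserved at the amino acid level, primers annealing to them are commonly designed as degenerate oligonucleotides. A minimal Python sketch, assuming the standard genetic code and IUPAC nucleotide ambiguity codes: each residue is back-translated and, at each codon position, the set of possible bases is collapsed into one IUPAC symbol. Collapsing per position over-states the degeneracy for six-codon residues such as Leu, Ser and Arg, which is a known simplification. The function names are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, TTG, TCT, ...).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# IUPAC nucleotide ambiguity codes and the reverse lookup from base sets.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}
CODE_FOR_SET = {frozenset(bases): code for code, bases in IUPAC.items()}


def back_translate(peptide):
    """Degenerate sense-strand DNA (5'->3') that can encode the peptide."""
    out = []
    for aa in peptide.upper():
        codons = [c for c, a in CODON_TABLE.items() if a == aa]
        if not codons:
            raise ValueError(f"unknown residue: {aa}")
        for pos in range(3):
            out.append(CODE_FOR_SET[frozenset(c[pos] for c in codons)])
    return "".join(out)


def degeneracy(primer):
    """Number of distinct sequences represented by a degenerate primer."""
    total = 1
    for code in primer:
        total *= len(IUPAC[code])
    return total
```

For example, `back_translate("DGG")` gives `GAYGGNGGN`, a 32-fold degenerate sequence. In practice, residues with few codons (Met, Trp, Cys, Asp) are favoured near the primer's 3' end to keep the degeneracy low.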

In a preferred embodiment at least two identification sequences are identified. These sequences can be located at any distance from each other, but it is preferred that they are located as far as possible from each other on the genes.

According to these embodiments said identification sequences may correspond to an amino acid sequence of from 4 to 8 amino acid residues, preferably from 5 to 7 amino acid residues, which sequence is highly conserved among the peptides encoded by the collection of starting single or double stranded parental DNA templates.
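Candidate identification sequences can be located by scanning an alignment of the parental proteins. A minimal Python sketch, assuming the proteins have already been placed in a gap-padded multiple alignment of equal-length strings (the alignment step itself is not shown): it reports every window of the chosen width (4 to 8 residues) in which all sequences carry the same residue and no gaps. The function name and the toy alignment are hypothetical.

```python
def conserved_windows(alignment, width=6):
    """Start columns (0-based) of fully conserved, gap-free windows."""
    if not 4 <= width <= 8:
        raise ValueError("identification sequences span 4 to 8 residues")
    length = len(alignment[0])
    if any(len(row) != length for row in alignment):
        raise ValueError("rows of the alignment must have equal length")
    # A column is conserved when every sequence has the same non-gap residue.
    conserved = [
        col[0] != "-" and all(r == col[0] for r in col)
        for col in zip(*alignment)
    ]
    return [
        start for start in range(length - width + 1)
        if all(conserved[start:start + width])
    ]
```

Overlapping hits, such as columns 2, 3 and 4 for a width of 6 in a block of eight conserved residues, all point to the same conserved block; a strict identity requirement like this one can be relaxed to a per-column majority when the parents are very heterologous.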

It is preferred that the identification sequences are located a distance apart corresponding to the average size of the genes in said collection, with a variation of up to 40%. The further apart the sequences are, the larger the part of the gene that is shuffled.
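The spacing preference can be expressed as a simple check. A minimal Python sketch, assuming one reading of "the average size of the genes with a variation of up to 40%": the distance between two identification sequences must lie within ±40% of the mean gene length, and among admissible pairs the most widely spaced one is preferred because it shuffles the largest part of the gene. The function name and coordinates are hypothetical.

```python
from itertools import combinations


def pick_identification_pair(positions, gene_lengths, tolerance=0.40):
    """positions: candidate identification-sequence coordinates (bp).

    Returns the admissible pair lying furthest apart, or None.
    """
    mean_length = sum(gene_lengths) / len(gene_lengths)
    low, high = (1 - tolerance) * mean_length, (1 + tolerance) * mean_length
    admissible = [
        (p, q) for p, q in combinations(sorted(positions), 2)
        if low <= q - p <= high
    ]
    return max(admissible, key=lambda pq: pq[1] - pq[0], default=None)
```

For genes of 1000, 1200 and 1100 bp (mean 1100 bp), spacings between 660 and 1540 bp are admissible, so candidates at positions 10 and 1080 would be chosen over a closer pair.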

However, situations may arise where it is desired only to shuffle the sequences between identification sequences located quite close to each other.

As indicated above, the gene shuffling method used in the method of the invention is of little or no significance. In principle any method will work.

Thus the methods disclosed in WO 95/22625 and WO 95/17413 are fully operable in the present invention. Details showing how these methods may be used for practising the present invention are indicated in the Examples below.

Furthermore, gene shuffling methods described in co-filed patent applications are also contemplated for use in the present method.
According to one of these procedures, template shifts of newly synthesized DNA strands during in vitro DNA synthesis are utilized to achieve DNA shuffling.

More specifically, that method provides for the construction of a library of recombined homologous polynucleotides from a number of different starting single or double stranded parental DNA templates and primers by induced template shifts during an in vitro polynucleotide synthesis using a polymerase, whereby

A. extended primers or polynucleotides are synthesized by

   a) denaturing parental double stranded DNA templates to produce single stranded templates,

   b) annealing said primers to the single stranded DNA templates,

   c) extending said primers by initiating synthesis by use of said polymerase,

   d) causing arrest of the synthesis, and

   e) denaturing the double strand to separate the extended primers from the templates,

B. a template shift is induced by

   a) isolating the newly synthesized single stranded extended primers from the templates and repeating steps A.b) to A.e) using said extended primers produced in (A) as both primers and templates, or

   b) repeating steps A.b) to A.e),

C. the above process is terminated after an appropriate number of cycles of process steps A. and B.a), A. and B.b), or combinations thereof, and

D. optionally, the produced polynucleotides are amplified in a standard PCR reaction with specific primers to selectively amplify polynucleotides of interest.
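The effect of repeated arrest and re-annealing in steps A and B can be illustrated with a toy Python simulation. It is a sketch under stated assumptions, not a model of the actual chemistry: the parental templates are taken as pre-aligned strings of equal length, a growing strand may re-anneal to any parent at the same coordinate regardless of local homology, each extension burst has a random length of 1 to a hypothetical `max_extension` bases, the new strand is written in the parents' own orientation rather than as a complement, and steps B.a) and D. are not modelled. All function names are hypothetical.

```python
import random


def template_shift_product(parents, max_extension=30, rng=None):
    """One simulated extended primer. Returns (sequence, origin), where
    origin[i] is the index of the parent that served as template at i."""
    if rng is None:
        rng = random.Random()
    length = len(parents[0])
    if any(len(p) != length for p in parents):
        raise ValueError("toy model assumes aligned, equal-length parents")
    sequence, origin = "", []
    while len(sequence) < length:
        template = rng.randrange(len(parents))         # A.b)/B.: (re)anneal to any parent
        start = len(sequence)
        end = min(length, start + rng.randint(1, max_extension))  # A.c)-d): extend, then arrest
        sequence += parents[template][start:end]       # copy this template's stretch
        origin += [template] * (end - start)
    return sequence, origin


def crossovers(origin):
    """Number of template shifts that changed the parent being copied."""
    return sum(1 for x, y in zip(origin, origin[1:]) if x != y)


def build_library(parents, size, seed=0, **kwargs):
    """Step C.: repeat the cycles to produce a library of recombined strands."""
    rng = random.Random(seed)
    return [template_shift_product(parents, rng=rng, **kwargs) for _ in range(size)]
```

Shorter extension bursts (a smaller `max_extension`, corresponding to earlier arrest of synthesis) produce more template shifts per strand and hence more crossovers per library member.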
In a further specific embodiment the gene shuffling is performed by the method described in our co-filed application, whereby conserved regions of heterologous DNA sequences are identified for shuffling of heterologous DNA sequences of interest having at least one conserved region, comprising the following steps:

i) One or more conserved region(s) (designated "A", "B", "C" etc.) in two or more of the heterologous sequences are identified.

ii) Two sets of PCR primers (each set comprising a sense and an antisense primer) are constructed for one or more of the conserved region(s) identified in (i).

In these primers, one set (named: "a" = sense primer; "a'" = antisense primer) is directed to a sequence region 5' (sense strand) of the conserved region (e.g. conserved region "A"), and the second set (named: "b" = sense primer; "b'" = antisense primer) is directed to a sequence region 3' (sense strand) of the conserved region (e.g. conserved region "A"), and the antisense primer "a'" and the sense primer "b" have a homologous sequence overlap of at least 10 base pairs (bp) within the conserved region.

iii) For one or more of the conserved regions of interest identified in step (i), two PCR amplification reactions are performed using the heterologous DNA sequences from step (i) as templates, whereby one of the PCR reactions uses the 5' primer set identified in step (ii) (e.g. named "a"/"a'") and the second PCR reaction uses the 3' primer set identified in step (ii) (e.g. named "b"/"b'").

iv) The PCR fragments generated as described in step (iii) for one or more of the conserved regions identified in step (i) are isolated.

v) Two or more PCR fragments isolated in step (iv) are pooled, and a sequence overlap extension PCR reaction (SOE-PCR) is performed using the isolated PCR fragments as templates.

vi) The PCR fragment(s) obtained in step (v) are isolated, whereby the isolated PCR fragments comprise numerous different shuffled sequences containing a shuffled mixture of the PCR fragments isolated in step (iv).
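The fragment logic of steps (iii) to (vi) can be sketched in Python at the sequence level. This is a toy model under stated assumptions: each parent's conserved region is given as (start, end) coordinates; the "a"/"a'" product is taken to run from the 5' end through the conserved region and the "b"/"b'" product from the conserved region to the 3' end; and two fragments are joined in the SOE step only when the conserved region at the end of one is identical to that at the start of the other and is at least 10 bp long. Primer-derived sequence and polymerase errors are ignored; `soe_shuffle` is a hypothetical name.

```python
def soe_shuffle(parents, regions, min_overlap=10):
    """parents: DNA strings; regions: (start, end) of the conserved region in
    each parent (0-based, end exclusive). Returns the set of SOE products."""
    five_prime, three_prime = [], []
    for seq, (start, end) in zip(parents, regions):
        if end - start < min_overlap:
            raise ValueError("conserved overlap shorter than min_overlap")
        five_prime.append(seq[:end])                    # step (iii): "a"/"a'" reaction
        three_prime.append((seq[start:], end - start))  # step (iii): "b"/"b'" reaction
    products = set()
    for left in five_prime:                  # step (v): pooled fragments, all combinations
        for right, overlap in three_prime:
            if left.endswith(right[:overlap]):         # strands anneal over the shared region
                products.add(left + right[overlap:])   # extension fills in the remainder
    return products
```

With two parents sharing one conserved region, the pool yields both parental sequences plus the two reciprocal hybrids; each additional conserved region used in the same way multiplies the number of possible combinations.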


In specific embodiments various modifications can be made in the process of the invention. For example, it is advantageous to apply a defective polymerase: either an error-prone polymerase, to introduce mutations in comparison to the templates, or a polymerase that will discontinue the polynucleotide synthesis prematurely, to effect the arrest of the reaction.

According to a specific embodiment the peptide is a protease, especially a subtilase.


In the case of a subtilase, identification sequences may be located around the aspartic acid in position 32, or the histidine in position 64 and the active serine in position 221 of subtilisin BPN'.

In a further embodiment the peptide is an amylase, especially an α-amylase. In that case identification sequences may be located around the Asp in position 100 and the Asp in position 328 of B. licheniformis α-amylase.

For α-amylases from Bacillus species the identification sequences may preferentially be located around Tyr in position 8 and around Ser in position 476.

In further embodiments the peptide is a lipase or a cellulase.

In respect of lipases, suitable identification sequences may be found by using the lipase alignment shown in A. Svendsen et al. (1995): Biochemical properties of cloned lipases from the Pseudomonas family, Biochimica et Biophysica Acta 1259, 9-17. Examples could be around the Pro in position 10 or around the His in position 285 (using P. glumae lipase numbering).

In respect of cellulases, in particular family 45 cellulases (see WO 96/29397), suitable identification sequences may be the conserved region "Thr Arg Tyr Trp Asp Cys Cys Lys Pro/Thr" and the conserved region "Trp Arg Phe/Tyr Asp Trp Phe". For further details relating to those cellulase identification sequences reference is made to (PCT DK97/00216). See in particular Example 3 of (PCT DK97/00216).

In respect of xylanases, in particular family 11 xylanases, suitable identification sequences may be the conserved regions "DGGTYDIY" and "EGYQSSG". For further details relating to those xylanase identification sequences reference is made to (PCT DK97/00216). See in particular Examples 1 and 2 of (PCT DK97/00216).
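Identification sequences with alternative residues at a position, such as "Pro/Thr" or "Phe/Tyr", map naturally onto regular-expression character classes. A minimal Python sketch, assuming motifs are written either as space-separated three-letter codes with "/" for alternatives or as a one-letter string; `motif_to_regex`, `find_motif` and the test sequence are hypothetical.

```python
import re

THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}


def motif_to_regex(motif):
    """'Trp Arg Phe/Tyr Asp' -> 'WR[FY]D'; one-letter motifs pass through."""
    parts = []
    for token in motif.split():
        options = [THREE_TO_ONE.get(o, o) for o in token.split("/")]
        parts.append(options[0] if len(options) == 1 else "[" + "".join(options) + "]")
    return re.compile("".join(parts))


def find_motif(motif, protein):
    """0-based start positions of non-overlapping matches in a protein sequence."""
    return [m.start() for m in motif_to_regex(motif).finditer(protein)]
```

The same helper handles both the cellulase motifs and one-letter xylanase motifs such as "DGGTYDIY"; the hits then give the protein coordinates to which primers are directed.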

PCR primers:

The PCR primers are constructed according to the standard descriptions in the art. Normally they are 10-75 base pairs (bp) long. However, for the specific embodiment using random or semi-random primers the length may be substantially longer, as indicated above.
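A quick sanity check on candidate primers can be sketched in Python. The 10-75 bp window comes from the text; the GC content and the Wallace-rule estimate Tm = 2 °C × (A + T) + 4 °C × (G + C) are common rules of thumb added here as assumptions (the Wallace rule is only a rough guide and is normally applied to short oligonucleotides). `primer_report` is a hypothetical name, and degenerate IUPAC bases are deliberately rejected.

```python
def primer_report(primer, min_len=10, max_len=75):
    """Length check plus GC content and a Wallace-rule Tm estimate (ACGT only)."""
    seq = primer.upper()
    if set(seq) - set("ACGT"):
        raise ValueError("only unambiguous A, C, G, T bases are handled")
    gc = seq.count("G") + seq.count("C")
    at = len(seq) - gc
    return {
        "length_ok": min_len <= len(seq) <= max_len,
        "gc_percent": 100.0 * gc / len(seq) if seq else 0.0,
        "tm_wallace_c": 2 * at + 4 * gc,
    }
```

For longer primers, a nearest-neighbour Tm calculation is usually preferred over the Wallace estimate.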

PCR reactions:

Unless otherwise mentioned, the PCR reactions performed according to the invention are carried out according to standard protocols known in the art.
The term "isolation of PCR fragment" is intended to be so broad as to cover simply an aliquot containing the PCR fragment. Preferably, however, the PCR fragment is isolated to an extent which removes surplus primers, nucleotides, templates etc.
In an embodiment of the invention the DNA fragment(s) is (are) prepared under conditions resulting in a low, medium or high random mutagenesis frequency.

To obtain a low mutagenesis frequency the DNA sequence(s) (comprising the DNA fragment(s)) may be prepared by a standard PCR amplification method (US 4,683,202 or Saiki et al., (1985), Science 239, 487-491).

A medium or high mutagenesis frequency may be obtained by performing the PCR amplification under conditions which increase the misincorporation of nucleotides, for instance as described by Deshler, (1992), GATA 9(4), 103-106; Leung et al., (1989), Technique, Vol. 1, No. 1, 11-15.
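The effect of raising the misincorporation rate can be estimated with a first-order approximation that is commonly used for PCR (an assumption here, not a statement from the cited papers): expected substitutions per molecule ≈ error rate per base per doubling × length × number of doublings. The Python sketch below pairs that estimate with a simple simulator that substitutes each base independently. Any rates passed in are illustrative placeholders, not measured values for a particular polymerase, and the function names are hypothetical.

```python
import random


def expected_mutations(length_bp, error_rate, doublings):
    """First-order estimate of substitutions per amplified molecule."""
    return length_bp * error_rate * doublings


def mutate(seq, per_base_probability, rng=None):
    """Replace each base, with the given probability, by a different base."""
    if rng is None:
        rng = random.Random()
    out = []
    for base in seq:
        if rng.random() < per_base_probability:
            base = rng.choice([b for b in "ACGT" if b != base])
        out.append(base)
    return "".join(out)
```

Under these assumptions, a 1000 bp fragment amplified for 20 doublings at an error rate of 1e-4 per base per doubling would carry about two substitutions on average; "low", "medium" and "high" frequencies then simply correspond to different choices of these parameters.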

It is also contemplated according to the invention to combine the PCR amplification (i.e. according to this embodiment also DNA fragment mutation) with a mutagenesis step using a suitable physical or chemical mutagenizing agent, e.g. one which induces transitions, transversions, inversions, scrambling, deletions, and/or insertions.

Expressing the recombinant protein from the recombinant shuffled sequences

Expression of the recombinant protein encoded by the shuffled sequence in step vi) of the second and third aspects of the present invention may be performed by use of standard expression vectors and corresponding expression systems known in the art.
Screening and selection

In the context of the present invention the term "positive polypeptide variants" means resulting polypeptide variants possessing functional properties which have been improved in comparison to the polypeptides producible from the corresponding input DNA sequences. Examples of such improved properties can be as different as e.g. enhanced or lowered biological activity, increased wash performance, thermostability, oxidation stability, substrate specificity, antibiotic resistance etc.

Consequently, the screening method to be used for identifying positive variants depends on which property of the polypeptide in question it is desired to change, and in what direction the change is desired.
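Because a positive variant is defined relative to the parental polypeptide and to a desired direction of change, the selection step can take the direction as an explicit parameter. A minimal Python sketch, assuming each variant has a single numeric assay value and that "positive" means beating the parental reference by a chosen fold margin; the names, margin and values are hypothetical.

```python
def positive_variants(measurements, parent_value, increase=True, min_fold=1.2):
    """measurements: {variant_id: assay value}. Returns the variants improved
    by at least min_fold over the parent, best first."""
    if increase:
        hits = {v: x for v, x in measurements.items() if x >= parent_value * min_fold}
    else:
        hits = {v: x for v, x in measurements.items() if x * min_fold <= parent_value}
    # Highest values first when increasing, lowest first when decreasing.
    return sorted(hits, key=hits.get, reverse=increase)
```

Keeping the margin explicit makes it easy to tighten the criterion in later shuffling rounds, when the parental reference itself has already been improved.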

A number of suitable screening or selection systems to screen or select for a desired biological activity are described in the art. Examples are:

Strausberg et al. (Biotechnology 13: 669-673 (1995)) de-
scribes a screening system for subtilisin variants having a
calcium-independent stability;

Bryan et al. (Proteins 1: 326-334 (1986)) describes a
screening assay for proteases having enhanced thermal stability;
and

PCT-DK95/00322 describes a screening assay for lipases
having an improved wash performance in washing detergents.

An embodiment of the invention comprises screening or se-
lection of recombinant protein(s), wherein the desired biologi-
cal activity is performance in dish-wash or laundry detergents.

Examples of suitable dish-wash or laundry detergents are dis-
closed in PCT-DK96/00322 and WO 95/30011.

If the improved functional property of the polypeptide is
not sufficiently good after one cycle of shuffling, the
polypeptide may be subjected to another cycle.

In an embodiment of the invention wherein polynucleotides
representing a number of mutations of the same gene are used as
templates, at least one shuffling cycle is a backcrossing cycle
with the initially used DNA fragment, which may be the wild-
type DNA fragment, in order to eliminate non-essential mutations.

Non-essential mutations may also be eliminated by using wild-
type DNA fragments as the initially used input DNA material.

Also contemplated to be within the invention are polypep-
tides having biological activity such as insulin, ACTH, gluca-
gon, somatostatin, somatotropin, thymosin, parathyroid hormone,
pituitary hormones, somatomedin, erythropoietin, luteinizing hor-
mone, chorionic gonadotropin, hypothalamic releasing factors,
antidiuretic hormones, thyroid stimulating hormone, relaxin,
interferon, thrombopoietin (TPO) and prolactin.

It is also contemplated according to the invention to
shuffle parental polynucleotides as indicated above originating
from wild-type organisms of different genera.

The starting parental DNA sequences may be any DNA se-
quences including wild-type DNA sequences, DNA sequences encod-
ing variants or mutants, or modifications thereof, such as ex-
tended or elongated DNA sequences, and may also be the outcome
of DNA sequences having been subjected to one or more cycles of
shuffling (i.e. output DNA sequences) according to the method
of the invention or any other method (e.g. any of the methods
described in the prior art section), or synthetic sequences or
otherwise mutagenized sequences.

When using the method of the invention, the resulting re-
combined polynucleotides (i.e. shuffled DNA sequences) have
had a number of nucleotide fragments exchanged. This results in
replacement of at least one amino acid within the polypeptide
variant when compared with the parent polypeptide. It is to
be understood that silent exchanges are also contemplated (i.e.
nucleotide exchanges which do not result in changes in the
amino acid sequence).
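Whether a given nucleotide exchange is an amino acid replacement or a silent exchange can be checked codon by codon with the standard genetic code. The following is a hypothetical helper, not part of the described method; it assumes in-frame, gap-free sequences of equal length and the standard code only.

```python
from typing import List, Tuple

BASES = "TCAG"
# Standard genetic code; codons in the order TTT, TTC, TTA, TTG, TCT, ..., GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(seq: str) -> str:
    """Translate an in-frame DNA sequence ('*' marks a stop codon)."""
    seq = seq.upper()
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq) - 2, 3))


def classify_exchanges(parent: str, variant: str) -> List[Tuple[int, str, str, str]]:
    """List (codon_index, parent_codon, variant_codon, kind) for changed codons.

    kind is "silent" if the encoded amino acid is unchanged, else "replacement".
    """
    if len(parent) != len(variant) or len(parent) % 3:
        raise ValueError("expected in-frame sequences of equal length")
    parent, variant = parent.upper(), variant.upper()
    result = []
    for idx in range(0, len(parent), 3):
        p, v = parent[idx:idx + 3], variant[idx:idx + 3]
        if p != v:
            kind = "silent" if CODON_TABLE[p] == CODON_TABLE[v] else "replacement"
            result.append((idx // 3, p, v, kind))
    return result
```

For example, changing ATGGCTAAA (Met-Ala-Lys) to ATGGCCGAA gives a silent exchange in codon 1 (GCT to GCC, both Ala) and a replacement in codon 2 (AAA to GAA, Lys to Glu).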



MATERIALS AND METHODS

EXAMPLES

EXAMPLE 1

Shuffling of a pool/population of evolutionary homologues
originating from bacterial hosts.

In this Example a gene shuffling method similar to the
one described in WO 95/22625 is used:

A population of subtilase-encoding genes or parts of such
genes is generated through isolation or by synthesis. Sources
for the genes may be as described in Siezen et al., Protein En-
gineering 4 (1991) 719-737. The population may also comprise
genes encoding the pre-pro subtilases as defined in GenBank en-
tries A1305b_l, .026542, A22550, Swiss-Prot entry SUBT_BACAM
P00782, and PD49B (Patent Application No. WO 96/34953), with
homologies (similarities) ranging from 32% to 64% as calculated
by the MegAlign software from DNASTAR Inc. (WI 53715, USA) us-
ing the Clustal Method.
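The homology range quoted above is the smallest and largest pairwise value over the population. The sketch below shows only that arithmetic, as percent identity over columns of an alignment produced elsewhere. It does not reproduce the MegAlign/Clustal similarity calculation (similarity scores that credit conservative substitutions will generally be higher), and the function names and the gap-handling convention are assumptions made here.

```python
from itertools import combinations
from typing import Dict, Tuple


def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity of two pre-aligned sequences.

    Identical residues are counted over columns where neither sequence has a
    gap, and divided by the number of such columns. Other conventions (e.g.
    dividing by the shorter ungapped length) give different numbers.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = identical = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == gap or y == gap:
            continue
        compared += 1
        identical += x == y
    if compared == 0:
        raise ValueError("no gap-free columns to compare")
    return 100.0 * identical / compared


def identity_range(aligned: Dict[str, str]) -> Tuple[float, float]:
    """Minimum and maximum pairwise identity over a multiple alignment."""
    values = [percent_identity(aligned[a], aligned[b])
              for a, b in combinations(aligned, 2)]
    return min(values), max(values)
```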

The substrates used in the shuffling reaction are repre-
sented by linear double-stranded DNA generated by PCR amplifi-
cation using primers located at/directed towards the ends of
the DNA to be shuffled. In this instance the primers can con-
veniently be constructed using the sequences surrounding the
histidine in position 64 of subtilisin BPN' and the serine in posi-
tion 221 of subtilisin BPN'. The template for this PCR can ei-
ther be plasmids containing cloned protease genes or chromoso-
mal DNA extracted from bacterial strains, e.g. protease-secret-
ing bacteria isolated from soil. The substrate will typically
be generated separately for all the templates and pooled before
the shuffling reaction.

The substrates are fragmented, e.g. by DNase I treatment
or shearing by sonication as described in WO 95/22625. The gen-
erated fragments are separated according to size by agarose gel
electrophoresis, and fragments of the desired size, e.g. from
10 to 50 bp, or from 30 to 100 bp, or from 50 to 150 bp, or from
100 to 200 bp, are purified from the gel.
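The fragmentation and size-selection steps can be pictured with a toy simulation in which every internal position of a substrate is cut independently with the same probability, followed by keeping only fragments inside a chosen size window. This is a hypothetical sketch: actual DNase I digestion or sonication is not uniform along the molecule, and the function names and parameters are illustrative assumptions.

```python
import random
from typing import List


def random_fragments(seq: str, mean_cut_spacing: float,
                     rng: random.Random) -> List[str]:
    """Cut `seq` at random positions (toy stand-in for DNase I digestion).

    Each internal bond is cut independently with probability
    1 / mean_cut_spacing, so fragments average roughly that many bp.
    """
    if mean_cut_spacing < 1:
        raise ValueError("mean_cut_spacing must be >= 1")
    p_cut = 1.0 / mean_cut_spacing
    fragments, start = [], 0
    for pos in range(1, len(seq)):
        if rng.random() < p_cut:
            fragments.append(seq[start:pos])
            start = pos
    fragments.append(seq[start:])
    return fragments


def size_select(fragments: List[str], min_bp: int, max_bp: int) -> List[str]:
    """Keep fragments within [min_bp, max_bp], mimicking excision of a gel band."""
    return [f for f in fragments if min_bp <= len(f) <= max_bp]
```

Pooled substrates would be digested separately and their fragments combined, e.g. `[f for s in substrates for f in random_fragments(s, 80, rng)]`, before applying a window such as 50 to 150 bp with `size_select`.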

These fragments are reassembled by PCR as described in
WO 95/22625. Optionally, correctly assembled DNA fragments are
amplified by subjecting the product from the assembly reaction
to another PCR including two primers able to anneal to the ends
of correctly assembled fragments. The resulting fragments can
be cloned into suitable expression plasmids and subsequently
screened for a specific property, such as thermostability, using
assays well known in the art.

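The outcome of reassembly can be illustrated with a crude in-silico model: a chimera is built segment by segment from aligned, equal-length parents, and at each segment boundary the template may switch to another parent only if that parent is identical to the growing chimera over a short preceding overlap (a rough stand-in for the annealing needed to prime extension). This is a hypothetical toy, not the PCR procedure of WO 95/22625; the function name, segment-length window and overlap rule are assumptions introduced for illustration.

```python
import random
from typing import Sequence


def shuffle_chimera(parents: Sequence[str], min_len: int, max_len: int,
                    overlap: int, rng: random.Random) -> str:
    """Toy in-silico reassembly of aligned, equal-length parent sequences.

    The chimera is extended by segments of min_len..max_len positions. After
    each segment a candidate parent is drawn at random; the template switches
    to it only if it matches the chimera over the last `overlap` positions,
    otherwise the current parent is kept.
    """
    length = len(parents[0])
    if any(len(p) != length for p in parents):
        raise ValueError("parents must be aligned to equal length")
    if not 1 <= min_len <= max_len:
        raise ValueError("need 1 <= min_len <= max_len")
    current = rng.randrange(len(parents))  # parent priming the first segment
    chimera = ""
    pos = 0
    while pos < length:
        end = min(length, pos + rng.randint(min_len, max_len))
        chimera += parents[current][pos:end]
        pos = end
        candidate = rng.randrange(len(parents))
        start = max(0, pos - overlap)
        # Crude "annealing" test: identical over the preceding overlap.
        if parents[candidate][start:pos] == chimera[start:pos]:
            current = candidate
    return chimera
```

In this toy model, raising `overlap` suppresses template switches between divergent parents, and listing the wild-type sequence several times in `parents` increases its share of segments, which loosely mimics the backcrossing cycle described earlier.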
PATENT CLAIMS

1. A method for the construction of a library of recombined
polynucleotides from a number of different starting single or
double stranded parental DNA templates, wherein said starting
single or double stranded parental DNA templates represent dis-
crete points in a population of genes encoding evolutionary or
synthetic homologues of a peptide having homologies ranging
over a broad spectrum from less than 15% to more than 80%, said
population exhibiting at least one identification sequence, and
whereby said genes are subjected to a gene shuffling procedure
to generate shuffled mutants of said population of genes repre-
senting additional discrete points between those of said start-
ing templates.

2. The method of claim 1, wherein said homologies range from
less than 45%, 40%, 35%, 30%, 25%, 20%, or 15% to more than
80%, 85%, 90%, 95%, or 99%.

3. The method of claim 1 or 2, wherein said starting popula-
tion exhibits at least two identification sequences.

4. The method of any of the claims 1 to 3, wherein said
identification sequences correspond to amino acid sequences of
from 4 to 8 amino acid residues, which sequence is highly con-
served among the peptides encoded by said collection of start-
ing single or double stranded parental DNA templates, prefera-
bly from 5 to 7 amino acid residues.



5. The method of claim 3 or 4, wherein said identification
sequences are located a distance apart corresponding to the av-
erage size of the genes in said collection with a variation of
up to 40%.

6. The method of claim 3, wherein said variation is 20%,
15%, 10%, or 5%.

7. A method of identifying a polypeptide of interest exhib-
iting improved properties in comparison to naturally occurring
or other known polypeptides of the same activity, whereby a
population of recombined polynucleotides produced by a process
according to any of the claims 1 to 6 is cloned into an appro-
priate vector, said vector is transformed into a suitable host
system to be expressed into the corresponding polypeptides,
and said polypeptides are screened in a suitable assay, and
positive polypeptides selected.

8. A method for producing a polypeptide of interest as iden-
tified according to claim 7, whereby a vector comprising a
polynucleotide encoding said identified polypeptide is trans-
formed into a suitable host, said host is grown to express said
polypeptide, and the polypeptide recovered and purified.

9. The method of claim 3, wherein said peptide is a prote-
ase, especially a subtilase.

10. The method of claim 9, wherein said identification se-
quences are located around the histidine in position 64 and the
active serine in position 221 of subtilisin BPN'.

11. The method of claim 8, wherein said peptide is an amy-
lase, especially an α-amylase.



12. The method of claim 11, wherein said identification se-
quences are located around the Asp in position 100 and the Asp
in position 323 of B. licheniformis α-amylase.


12. The method of claim 11, wherein said identification se-
quences are located around the Tyr in position 8 and around Ser
in position 476 of B. licheniformis α-amylase.

13. The method of claim 8, wherein said peptide is a lipase.

14. The method of claim 13, wherein said identification se-
quences are located around the Pro in position 10 and around
the His in position 285 of P. glumae lipase.


15. The method of claim 3, wherein said peptide is a cellu-
lase.

15. The method of claim 8, wherein said peptide is a xyla-
nase.






INTERNATIONAL SEARCH REPORT

International application No. PCT/DK 98/00103



A. CLASSIFICATION OF SUBJECT MATTER

IPC6: C12N 15/10, C12Q 1/68 // C12N 9/00

According to International Patent Classification (IPC) or to both national classification and IPC

B. FIELDS SEARCHED

Minimum documentation searched (classification system followed by classification symbols)

IPC6: C12N, C07K, C12Q

Documentation searched other than minimum documentation to the extent that such documents are included in the fields searched

SE, DK, FI, NO classes as above

Electronic data base consulted during the international search (name of data base and, where practicable, search terms used)



C. DOCUMENTS CONSIDERED TO BE RELEVANT

Category*   Citation of document, with indication, where appropriate, of the relevant passages   Relevant to claim No.

X           WO 9522625 A1 (AFFYMAX TECHNOLOGIES N.V.), 24 August 1995 (24.08.95),           1-15
            page 9, line 1 - line 16; page 16, line 24 - line 29;
            page 78, line 16 - line 25, claims

P,X         Nature, Volume 391, January 1998, Andreas Crameri et al.,                       1-15
            "DNA shuffling of a family of genes from diverse species
            accelerates directed evolution", page 288 - page 291

P,X         WO 9735966 A1 (MAXYGEN, INC), 2 October 1997 (02.10.97),                         1-15
            figure 1



[ ] Further documents are listed in the continuation of Box C.   [X] See patent family annex.



* Special categories of cited documents:

"A" document defining the general state of the art which is not considered to be of particular relevance
"E" earlier document but published on or after the international filing date
"L" document which may throw doubts on priority claim(s) or which is cited to establish the publication date of another citation or other special reason (as specified)
"O" document referring to an oral disclosure, use, exhibition or other means
"P" document published prior to the international filing date but later than the priority date claimed
"T" later document published after the international filing date or priority date and not in conflict with the application but cited to understand the principle or theory underlying the invention
"X" document of particular relevance; the claimed invention cannot be considered novel or cannot be considered to involve an inventive step when the document is taken alone
"Y" document of particular relevance; the claimed invention cannot be considered to involve an inventive step when the document is combined with one or more other such documents, such combination being obvious to a person skilled in the art
"&" document member of the same patent family



Date of the actual completion of the international search: 30 June 1998

Date of mailing of the international search report: 03-07-1998

Name and mailing address of the ISA/
Swedish Patent Office
Box 5055, S-102 42 STOCKHOLM
Facsimile No. +46 8 666 02 86

Authorized officer: Patrick Andersson
Telephone No. +46 8 782 25 00

Form PCT/ISA/210 (second sheet) (July 1992)



Information on patent family members (09/06/98)



Patent document              Publication   Patent family    Publication
cited in search report       date          member(s)        date

WO 9522625 A1                24/08/95      AU 2971495       04/09/95
                                           CA 2182393       24/08/95
                                           CN 1145641       19/03/97
                                           EP 0752008       08/01/97
                                           JP 10500561      20/01/98
                                           US 5605793       25/02/97

WO 9735966 A1                02/10/97      AU 2337797 A     17/10/97
                                           AU 2542697 A     17/10/97
                                           WO 9735957 A     02/10/97
                                           AU 1087397 A     19/06/97
                                           WO 9720078 A     05/06/97

Form PCT/ISA/210 (patent family annex) (July 1992)



